Using the lyrics dataset from the billboard R package, I make word clouds of the most frequently used words in the Billboard Hot 100 songs for each decade. We can get a rough sense of the change in sentiment across the decades, with more curse words appearing in the word clouds for more recent tracks. The word clouds are ordered from the 60s to the 10s.
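The per-decade word clouds follow the standard tidytext workflow; here is a minimal sketch, assuming the lyrics for one decade sit in a (hypothetical) data frame `lyrics_60s` with a `lyric` text column:

```r
library(dplyr)
library(tidytext)   # unnest_tokens(), stop_words
library(wordcloud)

# Hypothetical input: one row per lyric line for the decade
lyrics_60s %>%
  unnest_tokens(word, lyric) %>%          # one word per row
  anti_join(stop_words, by = "word") %>%  # drop "the", "and", etc.
  count(word, sort = TRUE) %>%
  with(wordcloud(word, n, max.words = 100))
```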
The basic dataset I start with is spotify_track_data from the billboard R package, which contains the Billboard Hot 100 songs from 1960 to 2015 along with their musical traits, such as tempo and key.
In order to capture sentiment from the lyrics, I downloaded lyrics data using the genius package and compared the lyrics to the AFINN lexicon. The AFINN lexicon assigns each word a score between -5 and 5, where negative scores indicate negative sentiment and positive scores indicate positive sentiment. The overall lyrical sentiment of a track is computed by summing the scores of all the words in its lyrics. The code for creating this overall sentiment for all the tracks is displayed in the appendix.
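As a small illustration of how a track's score is assembled (a sketch with tidytext; the one-line "lyric" here is made up):

```r
library(dplyr)
library(tidytext)

# Made-up lyric fragment, for illustration only
lyric_df <- tibble::tibble(text = "happy happy joy but so sad")

lyric_df %>%
  unnest_tokens(word, text) %>%                        # tokenize into words
  inner_join(get_sentiments("afinn"), by = "word") %>% # attach AFINN scores
  summarise(sentiment = sum(value))                    # overall track score
```

Words absent from the lexicon (here "but" and "so") simply drop out of the join and contribute nothing to the total.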
There is a connection between the lyrics and the musical traits of a track. One would often assume that musical delivery and lyrical sentiment align with each other. However, one cannot rule out that artists sometimes exploit this assumption and deliberately contrast these two aspects of modern pop music in order to achieve artistic effects such as juxtaposition or sarcasm. I plan to run a regression model on the relationship between the two and see how well it fits.
With the dataset I have, I also take into consideration the prevailing music styles of each decade.
| Decade | Dominant Style |
|---|---|
| 60s | R&B, Folk Rock |
| 70s | Disco/Dance, Punk |
| 80s | Dance-Pop, Hip Hop |
| 90s | Pop, Rap, Alternative Rock, Techno |
| 00s | Hip Hop, Emo, Pop/Teen Pop |
| 10s | Hip Hop, Pop, Rock |
Below is the density of lyrical sentiment of Billboard songs by decade. As time goes by, the distribution of lyrical sentiment becomes less concentrated and more spread out. Still, each decade's density peaks somewhere between 10 and 20.
In order to select predictors for the model, I first run stepwise selection on a linear regression. The initial model includes all the variables that I believe could connect to sentiment on the musical side. The selection, guided by AIC, returns a model with speechiness, instrumentalness, and valence.
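A sketch of that selection step (the candidate features listed here are the Spotify audio features in the dataset; the exact initial set is my assumption):

```r
# Full model with all candidate musical traits
full <- lm(sentiment ~ danceability + energy + loudness + speechiness +
             acousticness + instrumentalness + liveness + valence + tempo,
           data = track_data1)

# Stepwise selection in both directions, guided by AIC
selected <- step(full, direction = "both", trace = FALSE)
summary(selected)
```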
##
## Call:
## lm(formula = sentiment ~ speechiness + instrumentalness + valence,
## data = track_data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -508.06 -18.40 -4.45 14.74 549.18
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.427 1.723 6.631 3.73e-11 ***
## speechiness -81.571 8.809 -9.260 < 2e-16 ***
## instrumentalness -12.309 5.417 -2.273 0.0231 *
## valence 16.689 2.550 6.544 6.68e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 40.66 on 4460 degrees of freedom
## Multiple R-squared: 0.02645, Adjusted R-squared: 0.02579
## F-statistic: 40.39 on 3 and 4460 DF, p-value: < 2.2e-16
Speechiness captures the presence of spoken words in a track. The more exclusively speech-like the recording, the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, including cases such as rap. Values below 0.33 most likely represent music and other non-speech-like tracks.
From the density plot below, one can see how the speechiness density changes. Moving from the 60s to the 10s, the density exhibits larger variance, and the speechiness value at maximum density gradually moves right. This may be explained by the prevalence of hip-hop/rap music starting in the 90s.
| Decade | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| 60s | 0.0224 | 0.03080 | 0.0360 | 0.048300 | 0.540 |
| 70s | 0.0228 | 0.03130 | 0.0377 | 0.052475 | 0.613 |
| 80s | 0.0215 | 0.03025 | 0.0362 | 0.047000 | 0.255 |
| 90s | 0.0228 | 0.03090 | 0.0391 | 0.064000 | 0.464 |
| 00s | 0.0236 | 0.03605 | 0.0594 | 0.147500 | 0.576 |
| 10s | 0.0244 | 0.03850 | 0.0520 | 0.091600 | 0.516 |
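The quantile table above can be reproduced along these lines (assuming `track_data1` carries a `decade` factor):

```r
library(dplyr)
library(tidyr)

track_data1 %>%
  group_by(decade) %>%
  summarise(q = list(quantile(speechiness))) %>%  # 0/25/50/75/100% quantiles
  unnest_wider(q)                                 # one column per quantile
```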
The instrumentalness value measures how little vocal content a track has: the closer it is to 1.0, the more instrumental the track. As shown in the table, the mean and median instrumentalness scores become smaller through the decades, with their range also shrinking. Tracks become more vocal, perhaps as hip-hop music merges into the mainstream and influences other genres.
| Decade | Min | Median | Mean | Max |
|---|---|---|---|---|
| 60s | 0 | 5.30e-06 | 0.0510 | 0.984 |
| 70s | 0 | 8.50e-05 | 0.0345 | 0.944 |
| 80s | 0 | 3.28e-05 | 0.0198 | 0.898 |
| 90s | 0 | 6.30e-06 | 0.0225 | 0.974 |
| 00s | 0 | 0.00e+00 | 0.0064 | 0.738 |
| 10s | 0 | 0.00e+00 | 0.0034 | 0.680 |
Valence is a Spotify measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence should sound happier, more cheerful, or euphoric, while tracks with low valence should sound more negative (sad, depressed, angry). But is it a good measurement?
| year | track_name | artist_name | valence | sentiment |
|---|---|---|---|---|
| 1968 | Simon Says | 1910 Fruitgum Company | 0.985 | 22 |
| 1983 | She Works Hard For The Money | Donna Summer | 0.985 | -8 |
| 1979 | What A Fool Believes | The Doobie Brothers | 0.984 | -6 |
| 1973 | Rockin’ Pneumonia & The Boogie Woogie Flu | Huey “Piano” Smith | 0.982 | -14 |
| 1979 | September | Earth, Wind & Fire | 0.981 | 25 |
| 1987 | C’est La Vie | Robbie Nevil | 0.979 | 4 |
| 1970 | Hitchin’ a Ride | Vanity Fare | 0.978 | 0 |
| 1977 | Dancin’ Man | Q | 0.978 | 9 |
| 1961 | Let’s Twist Again | Chubby Checker | 0.977 | 29 |
| 1971 | Put Your Hand in the Hand | Ocean | 0.977 | 5 |
I listened to the song with the highest valence, Simon Says by 1910 Fruitgum Company, and it turned out not to be such a positive track. It is upbeat, but I would definitely not describe it as exceedingly cheerful. Take a look at its lyrics:
I’d like to play a game, That is so much fun, And it’s not so very hard to do, The name of the game is Simple Simon says, And I would like for you to play it to,
Put your hands in the air, Simple Simon says, Shake them all about, Simple Simon says, Do it when Simon says, Simple Simon says, And you will never be out. …
Does it convey an exceptionally cheerful message? Not really.
Taking into account the shifting baseline sentiment through the decades, I fit a Bayesian linear model with a random intercept (grouped by decade).
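A sketch of the fit with rstanarm (default priors; `bayes_fit` is my name for the fitted object):

```r
library(rstanarm)

bayes_fit <- stan_lmer(
  sentiment ~ speechiness + instrumentalness + valence + (1 | decade),
  data = track_data1
)
summary(bayes_fit, digits = 1)
```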
##
## Model Info:
## function: stan_lmer
## family: gaussian [identity]
## formula: sentiment ~ speechiness + instrumentalness + valence + (1 | decade)
## algorithm: sampling
## sample: 4000 (posterior sample size)
## priors: see help('prior_summary')
## observations: 4464
## groups: decade (6)
##
## Estimates:
## mean sd 10% 50% 90%
## (Intercept) 10.9 2.4 8.0 11.0 13.8
## speechiness -83.8 9.3 -95.8 -84.0 -71.8
## instrumentalness -11.8 5.3 -18.6 -11.7 -5.0
## valence 17.4 2.6 14.1 17.4 20.9
## b[(Intercept) decade:00s] 0.5 2.0 -1.8 0.5 3.0
## b[(Intercept) decade:10s] -1.8 2.2 -4.5 -1.7 0.7
## b[(Intercept) decade:60s] -2.9 2.1 -5.6 -2.7 -0.4
## b[(Intercept) decade:70s] 1.2 2.0 -1.1 1.2 3.7
## b[(Intercept) decade:80s] -0.7 2.0 -3.0 -0.6 1.5
## b[(Intercept) decade:90s] 3.3 2.1 0.8 3.2 5.9
## sigma 40.6 0.4 40.1 40.6 41.1
## Sigma[decade:(Intercept),(Intercept)] 15.7 20.3 2.7 9.4 34.5
##
## Fit Diagnostics:
## mean sd 10% 50% 90%
## mean_PPD 15.9 0.9 14.8 15.9 17.0
##
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
##
## MCMC diagnostics
## mcse Rhat n_eff
## (Intercept) 0.1 1.0 1370
## speechiness 0.2 1.0 3692
## instrumentalness 0.1 1.0 5158
## valence 0.0 1.0 4581
## b[(Intercept) decade:00s] 0.1 1.0 1193
## b[(Intercept) decade:10s] 0.1 1.0 1111
## b[(Intercept) decade:60s] 0.1 1.0 1089
## b[(Intercept) decade:70s] 0.1 1.0 1246
## b[(Intercept) decade:80s] 0.1 1.0 1077
## b[(Intercept) decade:90s] 0.1 1.0 1267
## sigma 0.0 1.0 4115
## Sigma[decade:(Intercept),(Intercept)] 0.6 1.0 1180
## mean_PPD 0.0 1.0 4203
## log-posterior 0.1 1.0 942
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
## 5% 95%
## (Intercept) 7.0473566 14.6792576
## speechiness -98.9085299 -68.0270666
## instrumentalness -20.3603179 -3.0462093
## valence 13.0750925 21.7930296
## b[(Intercept) decade:00s] -2.6205808 3.8745464
## b[(Intercept) decade:10s] -5.5962987 1.4574199
## b[(Intercept) decade:60s] -6.5402074 0.2503934
## b[(Intercept) decade:70s] -1.8280063 4.5782341
## b[(Intercept) decade:80s] -3.8466644 2.3418956
## b[(Intercept) decade:90s] 0.1193954 6.8917443
## sigma 39.9121726 41.2742707
## Sigma[decade:(Intercept),(Intercept)] 1.7669265 51.8026719
The fixed effect from speechiness is -83.9554, which indicates that the more speech-like a track is, the more negative its lyrics are expected to be. This aligns with the expectation that hip-hop/rap music tends to be emotionally negative.
The fixed effect from instrumentalness is -11.6668; the negative sign indicates that the more instrumental a track is (that is, the less vocal content it has), the more negative its lyrics are expected to be.
The regression coefficient for valence is 17.3762, a positive number implying a positive association between musical positiveness and lyrical sentiment. Considering that valence is a measurement calculated by Spotify, it seems to capture the general positiveness of tracks, but it is not advisable to rely on it alone when determining the sentiment of a track.
The random intercept reflects different base sentiment across the decades.
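The model is then checked with PSIS-LOO from the loo package (assuming the fitted object is named `bayes_fit`):

```r
library(rstanarm)  # loads loo support for stanreg objects

loo(bayes_fit)  # elpd_loo, p_loo, looic, and Pareto k diagnostics
```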
##
## Computed from 4000 by 4464 log-likelihood matrix
##
## Estimate SE
## elpd_loo -22878.7 164.2
## p_loo 20.8 5.2
## looic 45757.4 328.5
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 4462 100.0% 1086
## (0.5, 0.7] (ok) 2 0.0% 114
## (0.7, 1] (bad) 0 0.0% <NA>
## (1, Inf) (very bad) 0 0.0% <NA>
##
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
One does not observe large Pareto k diagnostic values, which would have indicated model misspecification.
Looking at the posterior predictive check plot, one can conclude that this is really not a model for prediction.
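The check itself is one call in rstanarm (again assuming the fitted object is named `bayes_fit`):

```r
library(rstanarm)

pp_check(bayes_fit)  # observed outcome density vs. posterior replications
```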
In a nutshell, the model confirms an association between lyrical sentiment score and musical traits. However, the power of musical traits alone, or at least of those available in the Spotify dataset, to explain lyrical sentiment is limited.
When generating a mood playlist, checking the mood of a track on both the musical and the lyrical side is a good starting point.
```r
library(billboard)  # spotify_track_data
library(genius)     # add_genius() for downloading lyrics
library(tidytext)   # unnest_tokens(), get_sentiments()
library(dplyr)

data(spotify_track_data)

n <- nrow(spotify_track_data)
sentiment <- rep(NA_real_, n)

for (i in seq_len(n)) {
  track <- tibble::tribble(
    ~artist, ~track,
    spotify_track_data$artist_name[i], spotify_track_data$track_name[i]
  )

  # Download the lyrics for this track from Genius
  lyrics <- track %>%
    add_genius(artist, track, type = "lyrics")

  # Skip tracks whose lyrics could not be found
  if (length(lyrics$track) != 0) {
    # Tokenize the lyrics and keep only words present in the AFINN lexicon
    lyrics1 <- lyrics %>%
      unnest_tokens(word, lyric) %>%
      inner_join(get_sentiments("afinn"), by = "word")
    # Overall sentiment = sum of AFINN scores over all matched words
    sentiment[i] <- sum(lyrics1$value)
  }
}

track_data <- cbind(spotify_track_data, sentiment)
write.csv(track_data, "track_data.csv", row.names = FALSE)
```